Cyber security using one or more models trained on a normal behavior

ABSTRACT

Disclosed herein is a method for detection of a cyber-threat to a computer system. The method is arranged to be performed by a processing apparatus. The method comprises receiving input data associated with a first entity associated with the computer system, deriving metrics from the input data, the metrics representative of characteristics of the received input data, analysing the metrics using one or more models, and determining, in accordance with the analysed metrics and a model of normal behavior of the first entity, a cyber-threat risk parameter indicative of a likelihood of a cyber-threat. A computer readable medium, a computer program and a threat detection system are also disclosed.

FIELD OF INVENTION

A method for detection of a cyber-threat to a computer system is disclosed. More specifically, but not exclusively, a method is disclosed comprising determining a likelihood of a threat in accordance with received input data relating to activity on the computer system and a model of normal behavior associated with the computer system.

BACKGROUND TO THE INVENTION

The Internet has transformed the way we live our lives and the way business operates. It has transformed the communications of government and official organizations to the extent where its criticality is not in doubt. We have an open and global interconnected economy, and the trend is irreversible. The reliance on digital infrastructure to deliver our core business processes has presented significant risk, making vulnerable our most precious commodity: our data, intellectual property, reputation and increasingly our connected critical national infrastructure.

The past few years have seen an exponential rise in the number of cyber-attacks affecting the networks of businesses of all sizes, and in all sectors. No one is immune to the threat, and whilst the cost to economies is significant, the cost to individual businesses can be catastrophic. The UK's GCHQ reports that 1,000 cyber-attacks are performed every hour in Britain and that 33,000 malicious emails containing sophisticated malware are blocked every month. In the U.S., the number of cyber intrusions has increased by seventeen times, according to the chairman of the Joint Chiefs of Staff, General Martin Dempsey.

The US Department of Homeland Security has revealed that twenty-three attacks have been carried out against companies connected to the US gas pipeline alone. The US Defense Science Board reported that hackers had gained access to over twenty-four major military systems, comprising top-secret information, including the designs of anti-ballistic missiles and fighter jets. 93% of large corporations and three quarters of small businesses in the UK are estimated to have had a cybersecurity breach in the past year. So pernicious has the threat become that, by the end of 2012, over half of European Union and NATO member nations had adopted a national cyber strategy.

The international press reports state-sponsored cyber-attacks represent a growing and significant threat, with countries looking to gain competitive commercial or political advantage through either stealing data, or compromising and disrupting key commercial, industrial and economic assets. Yet the threat to commercial companies is far broader than this. Malicious ‘insiders’ are difficult to detect through conventional methods, as they use legitimate and authorized access to networks to facilitate the theft of critical data and IP. Data loss through negligent behavior (laptops lost, public devices left logged-on to host networks, etc.) remains a threat. In 2012, a major bank lost the details of over 250,000 customers, including names, addresses, dates of birth and Social Security numbers, when unencrypted back-up tapes were lost in transit. Increasing numbers of attacks are delivered against executives travelling to high-risk countries with little or no awareness of either the threat or behavioral mitigations.

Organizations today are faced with more complex data, in higher and higher volumes, and the commercially viable timescales that determine its use and value are getting shorter. Additionally, faced with a rapidly-changing technology base, business is having to engage with and integrate a wide range of increasingly disruptive technologies, such as mobile and cloud-based computing, BYOD (Bring Your Own Device), and a diverse range of social media tools and technologies, just to remain compatible with peers. These technologies must be integrated and offered to staff and customers in relatively short time-scales. The challenge that they represent requires a fundamental shift in traditional perceptions of information security. Organizations are critically dependent on the flow of data between disparate parts of their organizations, to a mobile workforce, and to customers who demand efficient IT services. As a consequence enterprise boundaries have become electronically porous, dynamic and ill-defined. The conventional IT security model that relies on strict border/gateway control, which is analogous to the historical physical defensive methods of moats and drawbridges to keep attackers out, has by universal consensus broken down. By this convention the IT security industry spends considerable effort trying to police the perimeters of the corporate network, and protect it from unauthorized access. The dominant instantiation of this paradigm has been the regular expression driven SIEM (Security Information and Event Management) and signature driven endpoint products in a proliferation of forms.

These forms include many which restrict users' access to the network according to a defined set of corporate security policies. The reality, however, is that many, if not all, large corporate networks are likely to have already been compromised, and that malicious actors, either external or insider, have actively been targeting data. Today's socially-engineered threats, Advanced Persistent Threats and insider attacks by definition cannot simply be locked out. Data now needs protecting in the wild and can no longer exist behind high walls.

Deterministic approaches to threat detection have therefore been taken. Such traditional deterministic approaches are reliant on the assumption that the difference between what is legitimate activity and what is illegitimate activity can be defined. An expanding set of corporate policy and rules have been written to identify client programs as either compliant or not compliant. However, such deterministic approaches require a vast amount of effort to be spent in constantly updating these rules and signatures, in an attempt to stay up to date with the changing threat environment. The definition of what is legitimate is based on what we know about past attacks. For each successful intrusion, a new rule is written, the signature is updated and the next time that same threat presents itself, access is denied or challenged. This method is effective in defending against known threats that have already been identified. However, it is incapable of responding to fresh threats that are constantly presenting either through minor adjustments to existing vectors or more significant evolution of attack methods. Consequently, current threat detection and prevention systems are still being compromised and are unable to keep out threats.

Furthermore, as the technical defenses that protect our data have become more sophisticated, attackers have increasingly turned their attention to the softest part of the network, the user. Socially-engineered attacks account for over 85% of espionage threat, and were the fastest-growing attack vector of 2012. Attackers use subtle and sophisticated methods to manipulate human users to unwittingly install attacks on their behalf, and traditional technical defenses are rendered impotent within a process where a legitimate user has decided to click on a link in a ‘weaponized’ email or visited a familiar yet now compromised website. These new forms of attack make the problem of detecting threats even harder.

SUMMARY OF INVENTION

Disclosed herein is a more intelligent approach to cyber-threat detection, which embraces the nature of mobile data and porous networks. This new approach is realistic and informed about the threat, and seeks to recognize quiet threats in a noisy environment. Several key assumptions have been challenged in order to understand why knowledge of yesterday's attacks will not defend against those of tomorrow.

It has firstly been recognized that relying on identifying threats based on rules associated with previously identified threats is not sufficiently secure. Consequently, a completely new approach has been developed in order to overcome the problems of such traditional techniques.

It has also been accepted that real human beings are at either end of the attack chain, which allows for a more considered approach to detecting the idiosyncratic human factors that enable cyber threats. Shifting the perception of the threat allows solutions to be created that take as a base assumption the presence of the threat. A method and system is therefore provided in this disclosure that can detect subtle shifts and patterns, changes in behavior both of humans and machines, and highlight hitherto undetected threats.

The indicators of compromise are often subtle and complex. To take the insider threat, according to the CERT Insider Threat Center at the Carnegie Mellon University Software Engineering Institute, insiders that are intent on, or considering, malicious attacks often exhibit identifiable characteristics and/or warning signs before engaging in those acts. These signs are often hard to detect as they are often no more than subtle distortions of normal and legitimate behavior. For example, the contractor checking into the office building after hours and using the photocopier, a group of files being transferred under a new employee account, or a specific email communication being sent out of working hours: such weak pieces of information individually can, seen together, form indicators of mal-intent. These small indicators call for an intelligent approach that is able to see patterns in the information and activity and build an understanding of what is normal at any one time, and what is genuinely anomalous, based on the current threat and network environment. To be operationally viable any such technology must not produce the large numbers of false positives currently associated with traditional network intrusion detection systems.

The method and system disclosed herein enable automatic probabilistic real-time detection of cyber threat or compromise to computers and/or networks through changes in the computers and/or networks' behavior. The method and system also permit the automatic detection of human insider staff threat as their potentially malicious behavior is reflected in changes to the pattern of usage in networked office equipment. This is achieved mathematically without any prior knowledge of the threat type so that both established and entirely novel forms of cyber threat vector can be detected without the need for loading pre-determined signature, rules or antiviral updates etc. The result is a novel passive network and host-wide defensive and surveillance capability.

The method and system disclosed herein have a core Bayesian probabilistic model, referred to as the Hyper Cylinder model. Three additional concepts, which are governed by the model, are then utilized. These additional concepts may operate independently or in concert with each other. The core model permits automatic detection of cyber threats through probabilistic change in behavior of normal computers and computer networks. This is achieved with a programmatic application of an unsupervised mathematical model used for detecting behavioral change. ‘Normal’ is not a predefined entity but established dynamically within the model as a function of the real-world activity of each node and/or network.

The additional concepts include: 1) the ability to chain sets of user defined heuristics across a computer network and then dynamically modulate their application relative to the governing unsupervised probabilistic ‘Hyper Cylinder’ model; 2) Automatic network wide, self-organising, 3D projection of cyber threat across packet flow, connection topology and changing endpoint attributes; and 3) Detection of ICS and SCADA cyber compromise through contrasting mathematical models of normal device behavior at protocol layer.

The threat detection system disclosed herein may take input from probes in the form of a number of metrics. The metrics may be derived from an environment-defined selected combination of: the analysis of full TCP/IP network packet data inspection, third party log file ingestion, endpoint file system and OS parameters, network-level metadata technologies such as NetFlow and IP Flow Information Export (IPFIX), building power and physical security metrics and data flows etc. These inputs may be converted into a normative model of individual devices on the network and the overall topological ‘shape’ of the devices' external and internal communications.

In accordance with an aspect of the invention there is provided a method for detection of a cyber-threat to a computer system, the method arranged to be performed by a processing apparatus, the method comprising receiving input data associated with a first entity associated with the computer system, deriving metrics from the input data, the metrics representative of characteristics of the received input data, analysing the metrics using one or more models, and determining, in accordance with the analysed metrics and a model of normal behavior of the first entity, a cyber-threat risk parameter indicative of a likelihood of a cyber-threat.

The method may further comprise updating the model of normal behavior of the first entity in accordance with the analysis of the metrics. The received input data may include data relating to activity on the computer system associated with the first entity. The derived metrics may reflect a usage of the computer system by the first entity over a period of time. The derived metrics may be network traffic related metrics associated with activity of the first entity on the computer system. The derived metrics may be derived from header analysis on an Internet Layer protocol level of the computer system. The method may further comprise selecting the plurality of metrics from a range of possible metrics before deriving the plurality of metrics. The one or more models may include a first model arranged to analyse data for detecting a first type of threat. The one or more models may include a second model arranged to analyse data for detecting a second type of threat. The cyber-threat risk parameter may be a probability of the likelihood of a threat. The probability may be determined using a recursive Bayesian estimation. The method may further comprise determining whether or not there is a threat by comparing the cyber-threat risk parameter with a threshold. The threshold may be a moving threshold. The cyber-threat risk parameter may be determined by comparing the analysed metrics with the model of normal behavior of the first entity. The method may further comprise predicting an expected behavior of the first entity based on the model of normal behavior. The determining the cyber-threat risk parameter may comprise comparing the analysed metrics with the expected behavior. The method may further comprise receiving input data associated with a second entity, wherein the determining the cyber-threat risk parameter comprises taking the input data associated with the second entity into consideration. The entity may be a user of the computer system. The entity may be a device forming part of the computer system. The entity may form part of an industrial control system connected to or forming part of the computer system. The computer system may be a network of computing devices.

According to another aspect of the invention there is provided a computer readable medium comprising computer readable code operable, in use, to instruct a computer to perform any method disclosed herein.

According to a further aspect of the invention there is provided a computer program comprising computer readable code operable, in use, to instruct a computer to perform any method disclosed herein.

According to yet another aspect of the invention there is provided a threat detection system comprising a processor and a memory comprising computer readable code operable, in use, to instruct the processor to perform any method disclosed herein.

The computer system may be a single computer. Alternatively, the computer system may be a network of computers and/or other electronic devices. The computer system may also be a collection of networks. When the computer system is a single computer, the processing apparatus may be a processor of the computer. The processing apparatus may be part of the computer system.

The method may further comprise developing a pattern of life for a user in the form of a model. The method may further comprise developing the pattern of life based on various data gathered regarding the user.

The method may involve automatic real-time cyber-threat detection. The model may be an unsupervised mathematical model. The model may be a normative model. The model may be a self-learning model. Probes may analyse the metrics. The system may have a non-frequentist architecture. The model may be ever-changing. The model may comprise user defined heuristics. The model may be updated when new data is received. The model may be updated when new data is received that is deemed within the limits of normal behavior.

Activity by the user may comprise interactions with other entities of the computer system. When the entity is a user (i.e. user account) these interactions may be with other users or with devices forming part of the computer system.

The metrics may represent characteristics of the received input data. The characteristics may include quantification of the input data. The characteristics may include simplification of the input data.

The threshold used for determining if there is a threat may be a moving threshold. The moving threshold may be varied according to changes in the computer system.

The presence of unexpected behavior may be indicative of a threat. The absence of expected behavior may be indicative of a threat.

The determining of the cyber-threat risk parameter taking the input data associated with the second entity into consideration may involve analysing causal links between data associated with the first entity and data associated with the second entity. The link between data associated with any number of entities on the computer system may be taken into consideration when performing the threat detection method.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention shall now be described with reference to the drawings in which:

FIG. 1 illustrates a network of computer systems 100 using a threat detection system according to an embodiment of the invention; and

FIG. 2 illustrates a flow diagram of the process carried out by the threat detection system for automatic detection of cyber threats.

Throughout the description and the drawings, like reference numerals refer to like parts.

SPECIFIC DESCRIPTION

FIG. 1 illustrates a network of computer systems 100 using a threat detection system according to an embodiment of the invention. The system depicted by FIG. 1 is a simplified illustration which is provided for ease of explanation of the invention.

The system 100 comprises a first computer system 10 within a building, which uses the threat detection system to detect and thereby attempt to prevent threats to computing devices within its bounds. The first computer system 10 comprises three computers 1, 2, 3, a local server 4, and a multifunctional device (MFD) 5 that provides printing, scanning and facsimile functionalities to each of the computers 1, 2, 3. All of the devices within the first computer system 10 are communicatively coupled via a Local Area Network (LAN) 6. Consequently, all of the computers 1, 2, 3 are able to access the local server 4 via the LAN 6 and use the functionalities of the MFD 5 via the LAN 6.

The LAN 6 of the first computer system 10 is connected to the Internet 20, which in turn provides computers 1, 2, 3 with access to a multitude of other computing devices including server 30 and second computer system 40. Second computer system 40 also includes two computers 41, 42, connected by a second LAN 43.

In this exemplary embodiment of the invention, computer 1 on the first computer system 10 has the threat detection system and therefore runs the threat detection method for detecting threats to the first computer system. As such, it comprises a processor arranged to run the steps of the process described herein, memory required to store information related to the running of the process, as well as a network interface for collecting the required information. This method shall now be described in detail with reference to FIG. 1.

The computer 1 builds and maintains a dynamic, ever-changing model of the ‘normal behavior’ of each user and machine within the system 10. The approach is based on Bayesian mathematics, and monitors all interactions, events and communications within the system 10: which computer is talking to which, files that have been created, networks that are being accessed.

For example, computer 2 is based in a company's San Francisco office and operated by a marketing employee who regularly accesses the marketing network, usually communicates with machines in the company's U.K. office in second computer system 40 between 9.30 am and midday, and is active from about 8.30 am until 6 pm. The same employee virtually never accesses the employee time sheets, very rarely connects to the company's Atlanta network and has no dealings in South-East Asia. The threat detection system takes all the information that is available relating to this employee and establishes a ‘pattern of life’ for that person, which is dynamically updated as more information is gathered. The ‘normal’ model is used as a moving benchmark, allowing the system to spot behavior on a system that seems to fall outside of this normal pattern of life, and flags this behavior as anomalous, requiring further investigation.

The threat detection system is built to deal with the fact that today's attackers are getting stealthier and an attacker may be ‘hiding’ in a system to ensure that they avoid raising suspicion in an end user, such as by slowing their machine down, using normal software protocol. Any attack process thus stops or ‘backs off’ automatically if the mouse or keyboard is used. However, yet more sophisticated attacks try the opposite, hiding in memory under the guise of a normal process and stealing CPU cycles only when the machine is active, in an attempt to defeat a relatively-simple policing process. These sophisticated attackers look for activity that is not directly associated with the user's input. As an APT (Advanced Persistent Threat) attack typically has very long mission windows of weeks, months or years, such processor cycles can be stolen so infrequently that they do not impact machine performance. But, however cloaked and sophisticated the attack is, there will always be a measurable delta, even if extremely slight, in typical machine behavior, between pre and post compromise.

This behavioral delta can be observed and acted on with the novel form of Bayesian mathematical analysis used by the threat detection system installed on the computer 1.

The threat detection system has the ability to self-learn and detect normality in order to spot true anomalies, allowing organizations of all sizes to understand the behavior of users and machines on their networks at both an individual and group level. Monitoring behaviors, rather than using predefined descriptive objects and/or signatures, means that more attacks can be spotted ahead of time and extremely subtle indicators of wrongdoing can be detected. Unlike traditional endpoint defenses, a specific attack type or new malware does not have to have been seen first before it can be detected. A behavioral defense approach mathematically models both machine and human activity behaviorally, at and after the point of compromise, in order to predict and catch today's increasingly sophisticated cyber-attack vectors. It is thus possible to computationally establish what is normal, in order to then detect what is abnormal.

The threat detection system shall now be described in further detail with reference to FIG. 2, which provides a flow diagram of the process carried out by the threat detection system for automatic detection of cyber threats through probabilistic change in normal behavior through the novel application of an unsupervised Bayesian mathematical model to detect behavioral change in computers and computer networks.

The core threat detection system is termed the ‘hyper cylinder’. The hyper cylinder is a Bayesian system of automatically determining periodicity in multiple time series data and identifying changes across single and multiple time series data for the purpose of anomalous behavior detection.

Human, machine or other activity is modelled by initially ingesting data from a number of sources at step S1 and deriving second order metrics at step S2 from that raw data. The raw data sources include, but are not limited to:

Raw network IP traffic captured from an IP or other network TAP or SPAN port

Machine generated log files.

Building access (“swipe card”) systems.

IP or non IP data flowing over an Industrial Control System (ICS) distributed network

Individual machine, peripheral or component power usage.

Telecommunication signal strength.

Machine level performance data taken from on-host sources (CPU usage/memory usage/disk usage/disk free space/network usage/etc)

From these raw sources of data, a large number of metrics can be derived, each producing time series data for the given metric. The data are bucketed into individual time slices (for example the number observed could be counted per 1 second, per 10 seconds or per 60 seconds), which can be combined at a later stage where required to provide longer range values for any multiple of the chosen interval size. For example if the underlying time slice chosen is 60 seconds long and thus each metric time series stores a single value for the metric every 60 seconds, then any new time series data of a fixed multiple of 60 seconds (120 seconds, 180 seconds, 600 seconds etc) can be computed with no loss of accuracy. Metrics are chosen directly and fed to the hyper cylinder by a lower order model which reflects some unique underlying part of the data, and which can be derived from the raw data with particular domain knowledge. The metrics that are obtained depend on the threats that the system is looking for. In order to provide a secure system it is common for a large number of metrics relating to a wide range of potential threats to be obtained.
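The bucketing and later recombination of time slices described above can be illustrated with a minimal sketch. It assumes a count-style metric (events per slice), for which coarser slices are obtained exactly by summation; the function names bucket_counts and combine_slices are hypothetical and are not taken from the disclosure.

```python
from collections import Counter

def bucket_counts(timestamps, slice_seconds=60):
    """Count events per fixed-width time slice; returns a dense list of counts."""
    if not timestamps:
        return []
    counts = Counter(int(t // slice_seconds) for t in timestamps)
    first, last = min(counts), max(counts)
    # Fill empty slices with 0 so the series has one value per slice.
    return [counts.get(i, 0) for i in range(first, last + 1)]

def combine_slices(series, factor):
    """Sum consecutive groups of `factor` slices to obtain a coarser series."""
    usable = len(series) - len(series) % factor  # drop an incomplete trailing group
    return [sum(series[i:i + factor]) for i in range(0, usable, factor)]

# Hypothetical event times, in seconds since the start of observation.
events = [5, 20, 61, 130, 170, 175, 250]
per_minute = bucket_counts(events, 60)           # 60-second slices
per_two_minutes = combine_slices(per_minute, 2)  # 120-second slices, no re-scan of raw data
```

Summation is only lossless for additive metrics such as counts or byte totals; a metric such as a maximum or an average would need a different combining rule.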

The actual metrics used are largely irrelevant to the Hyper Cylinder system which is described here, but some examples are provided below.

Metrics derived from network traffic could include data such as:

The number of bytes of data entering or leaving a networked device per time interval.

Probe output, such as a file access change point.

Invalid SSL certification.

Failed authorisation attempt.

Email access patterns.

In the case where TCP, UDP or other Transport Layer IP protocols are used over the IP network, and in cases where alternative Internet Layer protocols are used (e.g. ICMP, IGMP), knowledge of the structure of the protocol in use and basic packet header analysis can be utilized to generate further metrics, such as:

The number of multicasts per time interval originating from a networked device and intended to reach publicly addressable IP ranges.

The number of internal link-local IP Broadcast requests originating from a networked device.

The size of the packet payload data.

The number of individual TCP connections made by a device, or data transferred by a device, either as a combined total across all destinations or to any definable target network range (e.g. a single target machine, or a specific network range).

In the case of IP traffic, in the case where the Application Layer protocol can be determined and analysed, further types of time series metric can be defined, for example:

The number of DNS requests a networked device generates per time interval, again either to any definable target network range or in total.

The number of SMTP, POP or IMAP logins or login failures a machine generates per time interval.

The number of LDAP logins or login failures a machine generates.

Data transferred via file sharing protocols such as SMB, SMB2, FTP, etc

Logins to Microsoft Windows Active Directory, SSH or Local Logins to Linux or Unix Like systems, or other authenticated systems such as Kerberos.

The raw data required to obtain these metrics may be collected via a passive fiber or copper connection to the network's internal switch gear. Ideally the system receives a copy of each internal packet via a SPANing connection.

For other sources, a number of domain specific time series data are derived, each chosen to reflect a distinct and identifiable facet of the underlying source of the data, which in some way reflects the usage or behavior of that system over time.

Many of these time series data are extremely sparse, and have the vast majority of data points equal to 0. Examples would be employees using swipe cards to access a building or part of a building, or users logging into their workstation, authenticated by Microsoft Windows Active Directory Server, which is typically performed a small number of times per day. Other time series data are much more populated, for example the size of data moving to or from an always-on Web Server, the Web Server's CPU utilisation, or the power usage of a photocopier.

Regardless of the type of data, it is extremely common for such time series data, whether originally produced as the result of explicit human behavior or by an automated computer or other system, to exhibit periodicity, and to have the tendency for various patterns within the data to recur at approximately regular intervals. Furthermore, it is also common for such data to have many distinct but independent regular time periods apparent within the time series.
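As an illustration of what periodicity in such a time series means in practice, the following sketch estimates a dominant period using plain sample autocorrelation. This is a swapped-in, standard technique chosen for brevity; it is not the Bayesian periodicity determination of the hyper cylinder, whose details are not given here. The function names and the example series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    if variance_sum == 0:
        return 0.0  # a constant series carries no periodic signal
    covariance_sum = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum

def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

# Hypothetical metric that repeats every 6 time slices.
series = [9, 1, 0, 0, 1, 0] * 8
period = dominant_period(series, max_lag=10)
```

A series with several independent periods, as described above, would show several autocorrelation peaks; picking only the single highest peak is a simplification made for this example.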

At step S3, probes carry out analysis of the second order metrics. Probes are discrete mathematical models that implement a specific mathematical method against different sets of variables within the target network. For example, a Hidden Markov Model (HMM) may look specifically at the size and transmission time of packets between nodes. The probes are provided in a hierarchy which is a loosely arranged pyramid of models. Each probe or model effectively acts as a filter and passes its output to another model higher up the pyramid. At the top of the pyramid is the Hyper Cylinder which is the ultimate threat decision making model. Lower order probes each monitor different global attributes or ‘features’ of the underlying network and/or computers. These attributes consist of value over time for all internal computational features such as packet velocity and morphology, end point file system values, and TCP/IP protocol timing and events. Each probe is specialised to record and make decisions on different environmental factors based on the probe's own internal mathematical model such as an HMM.
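The pyramid arrangement, in which each probe filters its inputs and passes a result to a model higher up, might be sketched as follows. This is a structural illustration only: the class names, the scoring lambdas, the metric keys and the use of a maximum as the combining rule are all assumptions made for the example, and the top tier here is a trivial stand-in rather than the Hyper Cylinder itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Metrics = Dict[str, float]

@dataclass
class Probe:
    """Leaf of the pyramid: turns raw metrics into a score in [0, 1]."""
    name: str
    score: Callable[[Metrics], float]

    def evaluate(self, metrics: Metrics) -> float:
        return self.score(metrics)

@dataclass
class CombiningModel:
    """Higher tier: receives the outputs of lower tiers and emits one value."""
    name: str
    children: List[Union["CombiningModel", Probe]] = field(default_factory=list)

    def evaluate(self, metrics: Metrics) -> float:
        outputs = [child.evaluate(metrics) for child in self.children]
        # Assumption: pass the strongest signal upward; other rules are possible.
        return max(outputs) if outputs else 0.0

# Hypothetical probes over hypothetical metric names.
packet_probe = Probe("packet_volume", lambda m: min(m["bytes_out"] / 1_000_000, 1.0))
login_probe = Probe("login_failures", lambda m: min(m["login_failures"] / 10, 1.0))

network_tier = CombiningModel("network", [packet_probe])
top_tier = CombiningModel("top", [network_tier, login_probe])

result = top_tier.evaluate({"bytes_out": 250_000, "login_failures": 8})
```

In a real deployment each leaf would wrap its own statistical model, for instance an HMM over packet sizes and timings, rather than a fixed scaling function.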

While the threat detection system may be arranged to look for any possible threat, in practice the system may keep watch for one or more specific threats depending on the network in which the threat detection system is being used. For example, the threat detection system provides a way for known features of the network such as desired compliance and Human Resource policies to be encapsulated in explicitly defined heuristics or probes that can trigger when in concert with set or moving thresholds of probability abnormality coming from the probability determination output. The heuristics are constructed using complex chains of weighted logical expressions manifested as regular expressions with atomic objects that are derived at run time from the output of data measuring/tokenizing probes and local contextual information. These chains of logical expression are then stored in and/or on online libraries and parsed in real-time against output from the measuring/tokenizing probes. An example policy could take the form of “alert me if any employee subject to HR disciplinary circumstances (contextual information) is accessing sensitive information (heuristic definition) in a manner that is anomalous when compared to previous behavior (Hyper Cylinder output)”. In other words, different arrays of pyramids of probes are provided for detecting particular types of threats.
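The example policy quoted above combines three ingredients: contextual information, a heuristic definition and the anomaly output of the model. A much simplified sketch of that combination is given below. It uses a single regular expression and a plain logical AND rather than the weighted chains of expressions described in the text; the Event fields, the path pattern and the 0.9 threshold are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Set

@dataclass
class Event:
    user: str
    resource: str               # token produced by a measuring/tokenizing probe
    anomaly_probability: float  # stand-in for the model's anomaly output

# Heuristic definition (assumed): resources under these paths count as sensitive.
SENSITIVE_RESOURCE = re.compile(r"^/(finance|hr|legal)/")

def should_alert(event: Event, users_under_review: Set[str], threshold: float = 0.9) -> bool:
    """Alert only when context, heuristic and anomaly output all agree."""
    in_context = event.user in users_under_review
    matches_heuristic = SENSITIVE_RESOURCE.search(event.resource) is not None
    is_anomalous = event.anomaly_probability >= threshold
    return in_context and matches_heuristic and is_anomalous

under_review = {"alice"}
alert_a = should_alert(Event("alice", "/hr/salaries.xlsx", 0.97), under_review)
alert_b = should_alert(Event("alice", "/hr/salaries.xlsx", 0.20), under_review)
alert_c = should_alert(Event("bob", "/hr/salaries.xlsx", 0.97), under_review)
```

Keeping the three conditions as separate named values makes it straightforward to replace the fixed threshold with a moving one, or to weight the conditions rather than AND them.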

The analysis performed by the probes on the second order metrics then outputs data in a form suitable for use with the model of normal behavior. As will be seen, the data is in a form suitable for comparing with the model of normal behavior and for updating the model of normal behavior.

At step S4, the threat detection system computes a threat risk parameter indicative of a likelihood of there being a threat using automated adaptive periodicity detection mapped onto observed behavioral pattern-of-life analysis. This deduces that a threat over time exists from a collected set of attributes that themselves have shown deviation from normative collective or individual behavior. The automated adaptive periodicity detection uses the period of time the Hyper Cylinder has computed to be most relevant within the observed network and/or machines. Furthermore, the pattern of life analysis identifies how a human and/or machine behaves over time, i.e. when they typically start and stop work. Since these models are continually adapting themselves automatically, they are inherently harder to defeat than known systems.
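The description does not state how the most relevant period is computed. As one possible illustration, and purely as an assumption, the dominant period of an activity series could be estimated from its autocorrelation, as in the following sketch. The function names and the choice of autocorrelation are hypothetical and are not taken from the disclosure.

```python
from typing import List, Optional


def autocorrelation(series: List[float], lag: int) -> float:
    """Biased sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series has no periodic structure
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def dominant_period(series: List[float], min_lag: int = 2,
                    max_lag: Optional[int] = None) -> int:
    """Return the lag with the highest autocorrelation (assumed proxy for the
    'most relevant' period); lags are searched up to half the series length."""
    if max_lag is None:
        max_lag = len(series) // 2
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))


# Hourly activity for one week: active between 09:00 and 17:00 each day.
week = [1.0 if 9 <= hour % 24 < 17 else 0.0 for hour in range(24 * 7)]
period = dominant_period(week)
```

Re-running the estimate as new data arrives allows the selected period to follow changes such as a new shift pattern, which is the adaptive aspect referred to above.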

The threat risk parameter is a probability of there being a threat in certain arrangements. Alternatively, the threat risk parameter is a value representative of there being a threat which is compared against one or more thresholds indicative of the likelihood of a threat.

In practice, the step of computing the threat involves comparing current data collected in relation to the user with the model of normal behavior of the user. The current data collected relates to a period in time; this could be in relation to a certain influx of new data or a specified period of time from a number of seconds to a number of days. In some arrangements, the system is arranged to predict the expected behavior of the system. The expected behavior is then compared with actual behavior in order to determine whether there is a threat.

In order to improve the accuracy of the system, a check can be carried out in order to compare current behavior of a user with associated users, i.e. users within a single office. For example, if there is an unexpectedly low level of activity from a user, this may not be due to unusual activity from the user, but could be due to a factor affecting the office as a whole. Various other factors can be taken into account in order to assess whether or not abnormal behavior is actually indicative of a threat.
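The peer check described above can be illustrated with a short sketch. It assumes, for illustration only, that activity is summarised as a single number per user and that the typical change across associated users is taken as the median; the function names and data are hypothetical.

```python
from statistics import median
from typing import Iterable, Tuple

Activity = Tuple[float, float]  # (current level, baseline level)


def relative_change(current: float, baseline: float) -> float:
    """Fractional change of current activity against the user's own baseline."""
    return (current - baseline) / baseline if baseline else 0.0


def peer_adjusted_change(user: Activity, peers: Iterable[Activity]) -> float:
    """Subtract the median change of associated users from the user's change,
    so that an office-wide effect does not count against the individual."""
    user_change = relative_change(*user)
    peer_change = median(relative_change(c, b) for c, b in peers)
    return user_change - peer_change


# The whole office is down by 80 %: the user's own drop is not unusual.
office_wide = peer_adjusted_change((20, 100), [(10, 50), (40, 200), (30, 150)])

# Only the user is down by 80 %: the deviation remains after adjustment.
user_only = peer_adjusted_change((20, 100), [(50, 50), (200, 200), (150, 150)])
```

In the first case the adjusted change is approximately zero, whereas in the second it remains at approximately -0.8 and would be passed on for further assessment.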

Finally, at step S5 a determination is made, based on the threat risk parameter, as to whether further action need be taken regarding the threat. This determination may be made by a human operator after being presented with a probability of there being a threat, or an algorithm may make the determination, e.g. by comparing the determined probability with a threshold.
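As a minimal sketch of the algorithmic determination at step S5, the probability may be compared against one or more thresholds to select a response. The threshold values and action names below are hypothetical and are chosen only to make the example concrete.

```python
from bisect import bisect_right

# Assumed, illustrative thresholds and responses; not taken from the disclosure.
THRESHOLDS = [0.5, 0.8, 0.95]
ACTIONS = ["ignore", "log", "notify_operator", "escalate"]


def triage(threat_probability: float) -> str:
    """Map a threat probability onto the action for the band it falls in."""
    if not 0.0 <= threat_probability <= 1.0:
        raise ValueError("threat_probability must lie in [0, 1]")
    # bisect_right counts how many thresholds the probability meets or exceeds.
    return ACTIONS[bisect_right(THRESHOLDS, threat_probability)]
```

A single threshold corresponds to the simplest case of two actions; adding thresholds allows a human operator to be involved only for the intermediate bands.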

In one arrangement, given the unique global input of the Hyper Cylinder, a novel form of threat visualisation is provided in which the user can view the threat landscape across all internal traffic and do so without needing to know how their internal network is structured or populated and in such a way that a ‘universal’ representation is presented in a single pane no matter how large the network. A topology of the network under scrutiny is projected automatically as a graph based on device communication relationships via an interactive 3D remote observer perspective interface. The projection is able to scale linearly to any node scale without prior seeding or skeletal definition.

The threat detection system that has been discussed above therefore implements a proprietary form of recursive Bayesian estimation to maintain a distribution over the probability state variable. This distribution is built from the complex set of low-level host, network and traffic observations or ‘features’. These features are recorded iteratively and processed in real time on the platform. A plausible representation of the relational information among entities in dynamic systems in general, such as an enterprise network, a living cell or a social community, or indeed the entire internet, is a stochastic network, which is topologically rewiring and semantically evolving over time. In many high-dimensional structured I/O problems, such as the observation of packet traffic and host activity within an enterprise LAN or WAN, where both input and output can contain tens of thousands, sometimes even millions of interrelated features (data transport, host-web-client dialogue, log change and rule trigger, etc.), learning a sparse and consistent structured predictive function is challenged by a lack of normal distribution. To overcome this, the threat detection system consists of a data structure that decides on a rolling continuum rather than a stepwise method in which recurring time cycles such as the working day, shift patterns and other routines are dynamically assigned. This provides a non-frequentist architecture for inferring and testing causal links between explanatory variables, observations and feature sets. This permits an efficiently solvable convex optimization problem and yields parsimonious models. In such an arrangement, the threat detection processing may be triggered by the input of new data. Alternatively, the threat detection processing may be triggered by the absence of expected data. In some arrangements, the processing may be triggered by the presence of a particular actionable event.
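The proprietary estimator is not detailed in the description. For orientation only, a textbook two-state recursive Bayesian filter is sketched below: the probability of a threat state is first propagated through an assumed transition model and then updated with the likelihood of each new observation. The transition probabilities, the likelihood values and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def predict(p_threat: float, p_start: float = 0.01, p_stop: float = 0.05) -> float:
    """Time update: a threat may begin (p_start) or end (p_stop) between observations."""
    return p_threat * (1.0 - p_stop) + (1.0 - p_threat) * p_start


def update(p_threat: float, lik_threat: float, lik_normal: float) -> float:
    """Measurement update by Bayes' rule for a two-state (threat/normal) variable."""
    numerator = lik_threat * p_threat
    evidence = numerator + lik_normal * (1.0 - p_threat)
    return numerator / evidence if evidence > 0 else p_threat


def run_filter(observations: Iterable[Tuple[float, float]],
               prior: float = 0.01) -> List[float]:
    """Process (likelihood under threat, likelihood under normal) pairs in order,
    returning the posterior threat probability after each observation."""
    p = prior
    history = []
    for lik_threat, lik_normal in observations:
        p = update(predict(p), lik_threat, lik_normal)
        history.append(p)
    return history


# Five consecutive observations that are far more likely under the threat state.
rising = run_filter([(0.9, 0.1)] * 5)

# Five observations that are typical of normal behavior.
quiet = run_filter([(0.1, 0.9)] * 5)
```

Because each posterior becomes the prior for the next observation, evidence accumulates over time: a run of unusual observations drives the probability upwards, while ordinary observations keep it near the prior.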

In a further arrangement, the system permits behavioral abnormality detection within the interactions and, crucially, state of entities within Industrial Control Systems (ICS), whether on-site (e.g. DCS/PLC) or geographically dispersed in the field (e.g. SCADA). This is achieved by combining the Hyper Cylinder with a bespoke smart thresholding ICS protocol aware programmable probe. This creates individual machine models of all entities (e.g. actuator, thermostat, etc.) and remains effective in a mixed-technology environment where a range of industrial control protocols and transport mechanisms are deployed, some of which may be rare and/or proprietary. The objective of the modelling and monitoring is to identify and allow response to attacks that may enable malicious control by an unapproved actor, loss of operator view, manipulation of operator view, and denial of operator control. The system could therefore be applied to any other control of remote devices from aircraft to military systems, as well as the Internet of Things.

The various methods described above may be implemented by a computer program product. The computer program product may include computer code arranged to instruct a computer to perform the functions of one or more of the various methods described above. The computer program and/or the code for performing such methods may be provided to an apparatus, such as a computer, on a computer readable medium or computer program product. The computer readable medium may be transitory or non-transitory. The computer readable medium could be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or a propagation medium for data transmission, for example for downloading the code over the Internet. Alternatively, the computer readable medium could take the form of a physical computer readable medium such as semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disc, and an optical disk, such as a CD-ROM, CD-R/W or DVD.

An apparatus such as a computer may be configured in accordance with such code to perform one or more processes in accordance with the various methods discussed herein. Such an apparatus may take the form of a data processing system. Such a data processing system may be a distributed system. For example, such a data processing system may be distributed across a network.

1-22. (canceled)
23. A method for detection of a cyber-threat to a computer system, the method arranged to be performed by one or more processing apparatuses, the method comprising: receiving input data associated with a first entity associated with the computer system and a second entity associated with the computer system; deriving from the received input data metrics representative of characteristics of the received input data; analyzing the derived metrics using a first self-learning model trained on a normal behavior of at least the first entity; analyzing one or more causal links between data associated with the first entity and data associated with the second entity; comparing the analyzed metrics to parameters that correspond to the normal behavior of at least the first entity; determining, in accordance with the analyzed derived metrics and the causal link, a cyber-threat risk parameter indicative of a likelihood of a cyber-threat.
24. The method according to claim 1, further comprising: identifying behavior deviating from a normal behavior of at least the first entity, where the first self-learning model trained on the normal behavior at least uses unsupervised learning; and building a chain of behavior including two or more causal links between data identifying the behavior deviating from the normal behavior associated with the first entity and data associated with the second entity to detect the cyber-threat.
25. The method according to claim 1, further comprising: where the modelled behavior in the first model of the first entity includes at least one of detecting a change in a pattern in any of 1) information, 2) activity, or a 3) combination of both in the computer system in order to be able to detect both a change in behavior of a user using the computing system as well as a change in behavior of a device in the computing system, where the first entity is the device in the computing system and the device's behavior is compared to the normal behavior of at least the first entity, and the second entity is the user of the computing system and the user's activities are compared to a normal behavior of at least the second entity.
26. The method according to claim 1, further comprising: predicting an expected behavior of the first entity of the computing system based on the first self-learning model trained on normal behavior; and wherein determining the cyber-threat risk parameter comprises comparing the analyzed, derived metrics with the predicted expected behavior and comparing whether parameters of the analyzed, derived metrics fall outside the parameters set by a threat parameter benchmark.
27. The method according to claim 1, where a normal behavior threshold is used by the first model as a moving benchmark of parameters that correspond to a normal pattern of life for the computing system.
28. The method according to claim 27, where the first self-learning model of normal behavior is updated when new input data is received that is deemed within the limits of normal behavior, where a normal behavior threshold is used by the first self-learning model as a moving benchmark of parameters that correspond to a normal pattern of life for the first entity, and the normal behavior threshold is varied according to the updated changes in the computer system allowing the model to spot behavior on the computing system that falls outside the parameters set by the moving benchmark.
29. The method according to claim 1, further comprising: using a second self-learning model trained on a normal behavior of at least the second entity to determine what is normal behavior for at least the second entity, using a third model trained to analyze data for detecting a first type of threat, and using a fourth model trained to analyze data for detecting a second type of threat.
30. The method according to claim 1, wherein the cyber-threat risk parameter is a probability of the likelihood of a threat determined using a recursive Bayesian estimation.
31. The method according to claim 30, further comprising: dynamically assigning recurring time cycles to a normal behavior threshold.
32. A non-transitory computer readable medium comprising computer readable code operable, when executed by one or more processing apparatuses in the computer system to instruct a computing device to perform the method of claim 1.
33. A threat detection system, comprising: at least one or more ports configured to receive input data associated with a first entity associated with a computer system and a second entity associated with the computer system; a non-transitory memory configured to store a first self-learning model trained on a normal behavior of at least the first entity and a computer readable code; and one or more processors configured to execute the computer readable code to derive from the received input data metrics representative of characteristics of the received input data, to analyze the derived metrics using a first self-learning model trained on a normal behavior of at least the first entity, to analyze one or more causal links between data associated with the first entity and data associated with the second entity, to compare the analyzed metrics to parameters that correspond to the normal behavior of at least the first entity, and to determine, in accordance with the analyzed derived metrics and the causal link, a cyber-threat risk parameter indicative of a likelihood of a cyber-threat.
34. The threat detection system to claim 11, wherein the processor is further configured to execute the computer readable code to identify behavior deviating from the normal behavior of at least the first entity and to build a chain of behavior applying at least two or more causal links to detect the cyber-threat.
35. The threat detection system to claim 11, wherein the processor is further configured to execute the computer readable code to analyze the derived metrics using a second self-learning model trained on a normal behavior of at least the second entity, to use a third self-learning model trained on a first type of threat, and to use a third self-learning model trained on a second type of threat.
36. The threat detection system to claim 11, wherein the processor is further configured to execute the computer readable code to predict an expected behavior of the first entity of the computing system based on the first self-learning model trained on normal behavior; and wherein determining the cyber-threat risk parameter comprises comparing the analyzed, derived metrics with the predicted expected behavior and comparing whether parameters of the analyzed, derived metrics fall outside the parameters set by a threat parameter benchmark.
37. The threat detection system to claim 11, wherein a normal behavior threshold is used by the first model as a moving benchmark of parameters that correspond to a normal pattern of life for the computing system, where the first self-learning model of normal behavior is updated when new input data is received that is deemed within the limits of normal behavior.
38. The threat detection system to claim 11, where the modelled behavior in the first model of the first entity includes at least one of detecting a change in a pattern in any of 1) information, 2) activity, or a 3) combination of both in the computer system in order to be able to detect both a change in behavior of a user using the computing system as well as a change in behavior of a device in the computing system, where the first entity is the device in the computing system and the device's behavior is compared to the normal behavior of at least the first entity, and the second entity is the user of the computing system and the user's activities are compared to a normal behavior of at least the second entity.
39. The threat detection system to claim 11, wherein the derived metrics are network traffic related metrics associated with activity of the first entity on the computer system reflecting a usage of the computer system by the first entity over a period of time.
40. The threat detection system to claim 11, wherein the cyber-threat risk parameter is a probability of the likelihood of a threat determined using a recursive Bayesian estimation, and where the processor is further configured to execute the computer readable code to dynamically assign recurring time cycles to a normal behavior threshold.
41. The threat detection system to claim 11, wherein results of the cyber-threat risk parameter are projected on a 3D graphical user interface that conveys cyber threats across a packet flow and connection topology corresponding to the computing system.
42. A network, comprising: at least one network switch; multiple computing devices operable by users of the network; a threat detection system that includes at least one or more ports configured to receive input data associated with a first entity associated with a computer system and a second entity associated with the computer system; a non-transitory memory configured to store a first self-learning model trained on a normal behavior of at least the first entity and a computer readable code; and a processor configured to execute the computer readable code to derive from the received input data metrics representative of characteristics of the received input data, to analyze the derived metrics using a first self-learning model trained on a normal behavior of at least the first entity, to analyze one or more causal links between data associated with the first entity and data associated with the second entity, and to determine, in accordance with the analyzed derived metrics and the causal link, a cyber-threat risk parameter indicative of a likelihood of a cyber-threat; and wherein the threat detection system leverages an improvement in the device by identifying cyber-threats to improve performance by the target device by containing the detected threat and minimizing an amount of CPU cycles, memory space, and power consumed by that detected threat in the network entity when the detected threat is contained by the initiated actions.